Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance tests #226

Merged
merged 17 commits into from
Oct 30, 2024
Merged

Enhance tests #226

merged 17 commits into from
Oct 30, 2024

Conversation

Harry-Chen
Copy link
Contributor

Modifications include:

  • register all tests with ctest framework, so we do not need to call test binaries manually
  • merge linux and macos CI config files, use strategy matrix to eliminate redundancy
  • use built-in toolchains of runner image instead installing new ones (they do have some issues -- see actions history in my forked repo for details)
  • allow CI to be triggered manually (workflow_dispatch)

New CI runs in my fork:

@liuzicheng1987
Copy link
Contributor

@Harry-Chen , thanks for the initiative! Seems like a great PR!

Do you have any idea why this failed for GCC 12? We used to test on GCC 11 and it always worked fine. Also, it seems to have failed on your account as well...

@liuzicheng1987
Copy link
Contributor

@Harry-Chen , obviously the failure didn't even happen during the tests, it happened during the benchmarks, so it is likely that it is due to a bug in one of the libraries we use for benchmarking, not our own library.

Could you just try changing this to GCC 11?

@Harry-Chen
Copy link
Contributor Author

Could you just try changing this to GCC 11?

I can reproduce the segmentation fault locally with GCC 12. I'd prefer trying to fix it first, and treat changing back to GCC 11 as the last resort.

@Harry-Chen
Copy link
Contributor Author

This SIGSEGV goes away when compiling with address sanitizer.

Valgrind says:

==3542836== Use of uninitialised value of size 8
==3542836==    at 0x17EDC1: move_to<rfl::Literal<rfl::internal::StringLiteral<18>{std::array<char, 18>{"FeatureCollection"}}>, rfl::Literal<rfl::internal::StringLiteral<18>{std::array<char, 18>{"FeatureCollection"}}> > (ViewReader.hpp:128)
==3542836==    by 0x17EDC1: assign_if_field_matches<0, rfl::json::Reader::YYJSONInputVar, rfl::NamedTuple<rfl::Field<rfl::internal::StringLiteral<5>{std::array<char, 5>{"type"}}, rfl::Literal<rfl::internal::StringLiteral<18>{std::array<char, 18>{"FeatureCollection"}}>*>, rfl::Field<rfl::internal::StringLiteral<9>{std::array<char, 9>{"features"}}, std::vector<canada_read::Feature, std::allocator<canada_read::Feature> >*> >, std::vector<rfl::Error>, std::array<bool, 2>, std::array<bool, 2> > (ViewReader.hpp:61)
==3542836==    by 0x17EDC1: assign_to_matching_field<0, 1, rfl::json::Reader::YYJSONInputVar, rfl::NamedTuple<rfl::Field<rfl::internal::StringLiteral<5>{std::array<char, 5>{"type"}}, rfl::Literal<rfl::internal::StringLiteral<18>{std::array<char, 18>{"FeatureCollection"}}>*>, rfl::Field<rfl::internal::StringLiteral<9>{std::array<char, 9>{"features"}}, std::vector<canada_read::Feature, std::allocator<canada_read::Feature> >*> >, std::vector<rfl::Error>, std::array<bool, 2>, std::array<bool, 2> > (ViewReader.hpp:102)
==3542836==    by 0x17EDC1: read (ViewReader.hpp:33)

... and ...

==3542836== Invalid write of size 1
==3542836==    at 0x17EDC1: move_to<rfl::Literal<rfl::internal::StringLiteral<18>{std::array<char, 18>{"FeatureCollection"}}>, rfl::Literal<rfl::internal::StringLiteral<18>{std::array<char, 18>{"FeatureCollection"}}> > (ViewReader.hpp:128)
==3542836==    by 0x17EDC1: assign_if_field_matches<0, rfl::json::Reader::YYJSONInputVar, rfl::NamedTuple<rfl::Field<rfl::internal::StringLiteral<5>{std::array<char, 5>{"type"}}, rfl::Literal<rfl::internal::StringLiteral<18>{std::array<char, 18>{"FeatureCollection"}}>*>, rfl::Field<rfl::internal::StringLiteral<9>{std::array<char, 9>{"features"}}, std::vector<canada_read::Feature, std::allocator<canada_read::Feature> >*> >, std::vector<rfl::Error>, std::array<bool, 2>, std::array<bool, 2> > (ViewReader.hpp:61)
==3542836==    by 0x17EDC1: assign_to_matching_field<0, 1, rfl::json::Reader::YYJSONInputVar, rfl::NamedTuple<rfl::Field<rfl::internal::StringLiteral<5>{std::array<char, 5>{"type"}}, rfl::Literal<rfl::internal::StringLiteral<18>{std::array<char, 18>{"FeatureCollection"}}>*>, rfl::Field<rfl::internal::StringLiteral<9>{std::array<char, 9>{"features"}}, std::vector<canada_read::Feature, std::allocator<canada_read::Feature> >*> >, std::vector<rfl::Error>, std::array<bool, 2>, std::array<bool, 2> > (ViewReader.hpp:102)
==3542836==    by 0x17EDC1: read (ViewReader.hpp:33)

So there is a hidden illegal write that should be fixed. It just happens not to crash with other compilers, I believe.

@Harry-Chen
Copy link
Contributor Author

@liuzicheng1987 I do not think it is a problem with 3rd party libraries, but within ViewReader.

GDB says the last frame is:

move_to<rfl::Literal<rfl::internal::StringLiteral<18>{std::array<char, 18>{"FeatureCollection"}}>, rfl::Literal<rfl::internal::StringLiteral<18>{std::array<char, 18>{"FeatureCollection"}}> > (_s=0x7fffffffa658, _t=0x7b7b210000000000)

It runs placement new: ::new (_t) Target(std::move(*_s));. Since _t is obviously a valid pointer, it fails.

(gdb) p *_view
$1 = {static seq_ = {<No data fields>}, values_ = {static size_ = 2, static positions_ = <same as static member of an already seen type>, static seq_ = <same as static member of an already seen type>, static num_bytes_ = 16, data_ = {
      _M_elems = "\000\000\000\000\000!{{DY\202\367\377\177\000"}}}
(gdb) ptype _view
type = class rfl::NamedTuple<rfl::Field<rfl::internal::StringLiteral<5>{std::array<char, 5>{"type"}}, rfl::Literal<rfl::internal::StringLiteral<18>{std::array<char, 18>{"FeatureCollection"}}>*>, rfl::Field<rfl::internal::StringLiteral<9>{std::array<char, 9>{"features"}}, std::vector<canada_read::Feature, std::allocator<canada_read::Feature> >*> >

(gdb) p _view->values_
$4 = {static size_ = 2, static positions_ = {_M_elems = {0, 8, 16}}, static seq_ = <same as static member of an already seen type>, static num_bytes_ = 16, data_ = {_M_elems = "\000\000\000\000\000!{{DY\202\367\377\177\000"}}
(gdb) ptype _view->values_
type = class rfl::Tuple<rfl::Literal<rfl::internal::StringLiteral<18>{std::array<char, 18>{"FeatureCollection"}}>*, std::vector<canada_read::Feature, std::allocator<canada_read::Feature> >*>

So t=0x7b7b210000000000 comes from _view->values_->data[0:8] ("\000\000\000\000\000\21\7b\7b") cast as uint64_t.

After tracing down, the _view used comes from:

static Result<T> read_struct(const R& _r, const InputVarType& _var) {
alignas(T) unsigned char buf[sizeof(T)];
auto ptr = std::launder(reinterpret_cast<T*>(buf));
auto view = ProcessorsType::template process<T>(to_view(*ptr));
using ViewType = std::remove_cvref_t<decltype(view)>;
const auto [set, err] =
Parser<R, W, ViewType, ProcessorsType>::read_view(_r, _var, &view);
if (err) [[unlikely]] {
call_destructors_where_necessary(set, &view);
return *err;
}
auto res = Result<T>(std::move(*ptr));
call_destructors_where_necessary(set, &view);
return res;
}

Along with Valgrind output, we can see the execution chain:

  1. _view is originally a static buffer on stack (with random content).
  2. it is converted to a NamedTuple, whose values_ contains variant type of data.
  3. the first 8 bytes of values_ is converted to a pointer and passed to move_to (thus Use of uninitialised value of size 8).
  4. move_to writes to this random pointer (thus Invalid write of size 1).

I do not currently have the ability to dig further, since I am not familiar with the complex templates used by reflect-cpp.
But I think this information is enough for any core developer to craft a fix.

@liuzicheng1987
Copy link
Contributor

Thanks so much, I will take a look.

@Harry-Chen
Copy link
Contributor Author

reflect-cpp-all-format-benchmarks.gz

And here's a statically-linked binary that could 100% reproduce the issue, should you need it.

@liuzicheng1987
Copy link
Contributor

@Harry-Chen, thanks for bringing this to my attention. The problem is quite a bit deeper and more complex than originally thought, but I will write a fix as quickly as I can. Please give me a couple of days, it requires some deeper changes.

@liuzicheng1987
Copy link
Contributor

@Harry-Chen, but I will certainly merge your PR once I am done. I think your PR is great! Thank you so much!

@Harry-Chen
Copy link
Contributor Author

@Harry-Chen, thanks for bringing this to my attention. The problem is quite a bit deeper and more complex than originally thought, but I will write a fix as quickly as I can. Please give me a couple of days, it requires some deeper changes.

Sure, take your time. Writing correct C++ is never easy 🤦

@liuzicheng1987
Copy link
Contributor

@Harry-Chen , the problem should be resolved with 29c4201.

Could you merge the main branch and add a separate test for GCC 11? The tests should run through for that one, and then I would be too happy to merge your PR.

Signed-off-by: Shengqi Chen <[email protected]>
THe newly-installed versions have some troublesome issues.

Signed-off-by: Shengqi Chen <[email protected]>
Signed-off-by: Shengqi Chen <[email protected]>
@Harry-Chen
Copy link
Contributor Author

@liuzicheng1987 I added gcc-11 as requested. Additionally I added ccache to speed up compilation when possible.

@liuzicheng1987
Copy link
Contributor

Great work! Thank you so much!

@liuzicheng1987 liuzicheng1987 merged commit 7bf71c6 into getml:main Oct 30, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants